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ABSTRACT 

Objectives The aim of this paper is to summarize 
concerns with the de-identification standard and 
methodologies established under the Health Insurance 
Portability and Accountability Act (HIPAA) regulations, 
and report some potential policies to address those 
concerns that were discussed at a recent workshop 
attended by industry, consumer, academic and research 
stakeholders. 

Target audience The target audience includes 
researchers, industry stakeholders, policy makers and 
consumer advocates concerned about preserving the 
ability to use HIPAA de-identified data for a range of 
important secondary uses. 

Scope HIPAA sets forth methodologies for de-identifying 
health data; once such data are de-identified, they are no 
longer subject to HIPAA regulations and can be used for 
any purpose. Concerns have been raised about the 
sufficiency of HIPAA de-identification methodologies, the 
lack of legal accountability for unauthorized re- 
identification of de-identified data, and insufficient public 
transparency about de-identified data uses. Although 
there is little published evidence of the re-identification of 
properly de-identified datasets, such concerns appear to 
be increasing. This article discusses policy proposals 
intended to address de-identification concerns while 
maintaining de-identification as an effective tool for 
protecting privacy and preserving the ability to leverage 
health data for secondary purposes. 



INTRODUCTION 

Health information collected initially for treat- 
ment or payment purposes by healthcare 
providers and health insurers has high value for 
other important, 'secondary' purposes, including 
quality improvement, medical research, public 
health, and business analytics. 1 2 The ability to 
access this information from most healthcare 
providers and health plans is governed by the 
Health Insurance Portability and Accountability 
Act (HIPAA) privacy regulations (the privacy 
rule). 3 But the privacy rule sets standards only for 
identifiable health information; information that 
qualifies as 'de-identified' under the privacy rule is 
not subject to HIPAA regulations. 4 Other federal 
and state health privacy rules typically also apply 
only to identifiable information. Consequently, 
de-identified data are in high demand for a broad 
range of secondary purposes. 

The HIPAA de-identification standards have been 
controversial since their inception in 2000, and 
those concerns have increased in the past few years. 
Such concerns fall into three categories: (1) suffi- 
ciency of de-identification methodologies; (2) lack 
of accountability for unauthorized or inappropriate 



re-identification; and (3) disapproval of certain uses 
of de-identified data. 

The American Recovery and Reinvestment Act 
of 2009 requires the Department of Health and 
Human Services (HHS) to issue a report on the 
HIPAA de-identification standard. 5 In response, 
HHS held 2 days of meetings on de-identification in 
March 2010, 6 but the report had not yet been 
issued when this article was submitted for publi- 
cation. Questions continue to linger about the 
protective value of HIPAA de-identification, while 
demands for these data increase. In 2011 the USA 
launched a major federal incentive programme 
designed to increase the use of electronic medical 
records by healthcare providers. One goal of the 
programme is to enhance the quality and efficiency 
of the healthcare system, which will require greater 
access to health information for analytics purposes. 
Failure to address concerns about the de-identifica- 
tion standard effectively could hamper efforts to 
leverage health information more robustly for 
health system improvements. 

The Center for Democracy & Technology (CDT) 
began exploring concerns about HIPAA de-identifi- 
cation back in 2009 7 and gathered approximately 
50 academic, industry and consumer stakeholders 
together at a small, non-public October 2011 
workshop to vet policy ideas for addressing these 
concerns. This paper reports on this workshop and 
explores some of the promising policy proposals in 
more detail. 

Concerns have also been raised about the use of 
personal information by commercial enterprises 
operating in the internet and mobile space. 8 Such 
entities are typically not covered by HIPAA and are 
not required to comply with its de-identification 
standards. Discussions of anonymization in those 
contexts are not covered in this paper, and 
mentions of 'de-identified' data herein refer to 
information de-identified per HIPAA. 

CONCERNS ABOUT HIPAA DE-IDENTIFICATION: 
FROM 2000 TO THE PRESENT 

The 1996 HIPAA statute authorized HHS to 
develop rules to protect 'individually identifiable 
health information' accessed, used or disclosed by 
'covered entities' (most healthcare providers, health 
plans, and health clearinghouses) and their 
contractors or business associates. 9 10 Under 
HIPAA, information is identifiable if it identifies the 
individual or there is 'reasonable basis to believe 
that the information can be used to identify the 
individual'. 11 Consequently, the privacy rule defines 
de-identified data as 'health information that does 
not identify an individual and with respect to 
which there is no reasonable basis to believe that the 
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information can be used to identify an individual'. Once data 
qualify as de-identified, they are no longer regulated by HIPAA 
and can be used for any purpose, without restriction. 

In summary, the privacy rule established two methodologies 
for achieving de-identification: the 'statistical' (or expert) 
method, requiring a qualified statistician to attest that the data 
raise very low risk of re-identification, and the 'safe harbor' 
method, which requires the removal of 18 types of identifiers. 
Both methodologies were included in the original privacy rule, 
finalized in December 2000, 13 and have largely remained the 
same for more than a decade (see box 1). 

Concerns about the de-identification standard were raised 
during its initial development, and they are remarkably similar 
to concerns that continue to be expressed now. After the stan- 
dard was initially proposed, HHS received comments stating 
that 'no record of information about an individual can be truly 
de-identified' and 'all such information should be treated and 
protected as identifiable because more and more information 
about individuals is being made available to the public... that 
would enable someone to match and identify records that 
otherwise appear to be not identifiable'. 13 In response, HHS 
expressly acknowledged the inability to de-identify to a level of 
zero risk but noted that the statutory standard envisions 'a 



Box 1 Current HIPAA methodologies for de-identification 
and limited datasets 



The statistical method requires that someone with 'appropriate 
knowledge of and experience with generally accepted statistical 
and scientific principles and rendering information not individually 
identifiable' must determine that 'the risk is very small that the 
information could be used, alone or in combination with other 
reasonably available information, by an anticipated recipient to 
identify an individual who is a subject of the information'. 14 The 
safe harbor option requires the removal of 18 specific data 
elements that could uniquely identify an individual, including 
names, all elements of dates more specific than a year, and most 
address information (except the initial three digits of a zip code in 
certain circumstances). 15 In addition, the data holder must not 
have 'actual knowledge' that the information in the de-identified 
dataset could be used to identify an individual subject. 16 
The safe harbor provides the easiest and most certain path to de- 
identification; however, because it requires the removal of 
precise dates and specific geographical information, it is often 
less useful for certain secondary purposes, such as health 
services research and syndromic surveillance. To respond to 
researchers concerns that the safe harbor standard resulted in 
data of limited utility for research purposes, the privacy rule was 
amended in 2002 to provide for the use of a limited dataset for 
healthcare operations, research or public health purposes. 17 To 
quality as a limited dataset, 1 6 categories of identifiers must be 
removed 18 ; however, identifiers often deemed important for 
healthcare research, such as full dates and more specific 
geographical information, may be retained. Such information is 
considered to be 'identifiable' and therefore is regulated by the 
privacy rule; however, it can be accessed, used or disclosed for 
these purposes without the need to obtain subject consent or 
authorization (or a waiver thereof). 18 The recipient of the data is 
required to execute a data use agreement that sets the param- 
eters for use of the data and prohibits re-identification of the 
subjects. 18 



reasonable balance between the risk of identification and 
usefulness of the information'. 13 

HHS further declined to deem de-identification status only to 
information raising zero risk of re-identification because this 
would 'preclude many laudable and valuable uses'... while 
imposing 'too great a burden on less sophisticated covered 
entities to be justified by the small decrease in the already small 
risk of identification'. 13 Instead, HHS made certain that the 
easiest path to de-identification, the safe harbor standard, 
removed the data elements — date of birth and zip code — used by 
Sweeney 19 to identify the records of Massachusetts Governor 
William Weld from a publicly available database of state 
employee hospital records, believed to be the first published 
incident of a re-identification of a healthcare database presumed 
to be anonymous. 20 

Concerns about the de-identification standard were increased 
by highly publicized re-identifications of presumed 'anonymous' 
personal information posted on the internet. 21 22 The informa- 
tion in these incidents was not required to meet HIPAA de- 
identification standards nor any other legal or agreed-upon 
scientific standard requiring a very low risk of re-identification. 
These incidents, in combination with Sweeney's previous 
re-identification work, were cited by Ohm in a 2010 article 
concluding that 'in the past 15 years, computer scientists have 
established... the easy re-identification result', proving that the 
notion that data can be de-identified to zero privacy risk is 'deeply 
flawed' and should be rejected as a 'privacy-providing panacea'. 20 

More recent articles have challenged the premise of the 'easy 
re-identification result', at least with respect to data de-identified 
by HIPAA standards. For example, a 2011 review by El Emam 
et al 23 of 14 published re-identification attacks revealed that only 
six of the attacks were on health data, and only one of those was 
on data de-identified by HIPAA or similarly rigorous standards. 
The attack on HIPAA de-identified data had a very low 
re-identification rate of 0.013%. 23 

Other recent studies have focused specifically on the suffi- 
ciency of the safe harbor method, which presumes that removal 
of 18 specific data elements will continue to ensure a very small 
risk of re-identification, notwithstanding an ever-changing data 
ecosystem. One study focused on the sufficiency of the safe 
harbor method in light of the potential availability of voter 
registration data for linking purposes. (Sweeney used voter 
registration records to find Governor Weld in the hospital 
dataset.) The study concluded that the re-identification risk of 
the safe harbor method was likely to vary by location (due to 
differences in population distributions of US states), the 
potential linking variables contained within voter registries of 
various states, and varying access policies for voter registries. 24 
The safe harbor has also been criticized as providing insufficient 
protection for datasets containing information at higher risk for 
re-identification but currently not required to be removed (such 
as some genetic information, longitudinal data, transactional 
data such as diagnosis codes, and free-form text). 25 

HHS in 2010 commissioned a study of the re-identification 
risk of a dataset compliant with the safe harbor. The study, 
involving admission records of Hispanic individuals in one 
hospital system between 2004 and 2009 and a hypothetical 
intruder with access to substantial market research information, 
re-identified only two of 15 000 individuals. 26 This re-identifi- 
cation was the sole successful example of re-identification in 
a HIPAA de-identified dataset identified in the 2011 review by El 
Emam et al. 23 

Concerns about HIPAA de-identification were raised by 
interested parties in the 2011 US Supreme Court case of Sorrell v. 



30 



J Am Med Inform Assoc 2013;20:29-34. doi:1 0.1 136/amiajnl-201 2-000936 



Review 



IMS Health, Inc. 27 The case involved a challenge to a Vermont 
statute prohibiting the use of de-identified data for the 
marketing of brand name drugs. The statute's primary aim was 
controlling healthcare costs driven by the prescription 
of branded drugs. The state also claimed the need to 
protect patient privacy although the statute relied solely on 
HIPAA de-identification to protect patient identities. In briefs 
submitted to the court, some academics and privacy adv- 
ocates criticized the HIPAA de-identification standard and some 
of the de-identification techniques purportedly used by the 
data mining companies involved in the case; they urged the 
court to uphold the Vermont statute as an important 
patient confidentiality measure. 28 The court declined to use 
the case as an opportunity to critique HIPAA de-identification, 
instead declaring the Vermont statute to be uncon- 
stitutional because it targeted marketing uses of data by phar- 
maceutical manufacturers, violating their commercial free 
speech rights. 29 Some harshly criticized the decision 
for its failure to address the privacy of the doctor— patient 
relationship. 30 31 

THE VALUE OF DE-IDENTIFIED DATA 

Notwithstanding concerns about the de-identification standard, 
it is critically important that HIPAA and other health privacy 
laws maintain a distinction between fully identifiable and de- 
identified data. 7 32 If privacy laws do not recognize this 
distinction, there will be no incentive for entities to expend 
resources to de-identify data and learn to work with them or to 
improve de-identification techniques. Instead, there will be 
a tendency to use fully identifiable data for secondary purposes 
when it is legally permissible, such as for public health and 
quality improvement, raising far greater privacy risk for indi- 
viduals. In addition, pressure will increase to make identifiable 
data available to meet commercial data needs that currently rely 
on de-identified data. Re-identification may still be possible with 
de-identified data — but when de-identification is done properly, 
re-identification should not be easy or cheap. 24 33 

Now is the time to consider appropriate policies to address 
concerns about HIPAA de-identification. Delaying response until 
after a well-publicized re-identification could lead to policy that 
is more reactionary than reflective. HHS recognized in estab- 
lishing the de-identification standard that there would still be 
some risk of re-identification. 13 Consequently, establishing 
policy that regulates even the low risk of re-identification makes 
sense. 

POLICIES TO BUILD TRUST IN DE-IDENTIFIED DATA 

The workshop convened by CDT in October 2011 included 
companies engaged in creating and/or using HIPAA de-identified 
data, academic experts on statistical de-identification, healthcare 
lawyers, and consumer advocates. CDT selected attendees 
because of their scholarship on de-identification, their experience 
in de-identifying data, their involvement in de-identification 
policy issues, and their interest in preserving privacy-protective 
mechanisms for using data to improve individual and population 
health. At the workshop, CDT gathered feedback on the 
following potential policy options, which were initially put 
forth by CDT in a 2009 white paper 7 : 

1. Prohibiting unauthorized re-identification of de-identified 
data; 

2. Ensuring the robustness of de-identification methodologies; 

3. Establishing reasonable security safeguards for de-identified 
data; 



4. Increasing public transparency about uses of de-identified 
data. 

Each is discussed in more detail below. 
Prohibiting unauthorized re-identification 

Although there is currently little publicly available evidence 
that re-identification of HIPAA de-identified datasets is 
a common occurrence, the potential for re-identification — and 
the lack of accountability for those who do it — will be 
a persistent concern that, if not addressed, could create obsta- 
cles to more widespread uses of de-identified data. One solution 
is to hold individuals and entities legally accountable for 
unauthorized re-identification of de-identified datasets. 
Such policies would need to apply to recipients of de-identified 
data who are not HIPAA covered entities; they are permitted 
to re-identify and use health data consistent with the 
privacy rule. 34 

A few workshop participants expressed concern that prohib- 
iting re-identification would force such activity further under- 
ground, making it more difficult to detect. 35 However, a number 
of workshop attendees favored policies establishing account- 
ability for unauthorized re-identification in order to build public 
trust in uses of de-identified data. The Institute of Medicine has 
also recommended that re-identification without authorization 
be subject to legal sanctions. 36 

Accountability for unauthorized re-identification can be 
accomplished in the following two ways: (1) through legislation 
prohibiting recipients of de-identified data from unauthorized 
re-identification of the information; and (2) by requiring HIPAA- 
covered entities (and business associates) to obtain agreements 
with recipients of de-identified data that prohibit the informa- 
tion from being re-identified without authorization. 

Both options are likely to require action by Congress, as HHS 
believes HIPAA does not give the department the power to 
regulate information that is not individually identifiable. 13 
Furthermore, HIPAA coverage does not extend beyond covered 
entities and entities performing services on their behalf. 

The first option has the advantage of potentially achieving 
more widespread coverage, including of health databases that 
may not currently be required to meet HIPAA de-identification 
standards and public use datasets, when acquiring enforceable 
agreement of data recipients not to re-identify may be a chal- 
lenge. However, the second option may be easier to implement, 
as some workshop attendees noted that covered entities 
frequently already require de-identified data recipients to agree 
contractually not to re-identify. 

Contractual provisions can be effective when the contracting 
parties choose to enforce them; however, it is also possible to create 
a 'hybrid option' in which contracts can be enforced by regulators 
or third parties. For example, Gellman 37 has developed model 
legislation, the Personal Data Deidentification Act, which would 
allow parties to a de-identified data agreement to opt into having it 
subject to enforcement by authorities or even by data subjects. 

Legal prohibitions against re-identification may need to include 
exemptions to accommodate re-identification research — ie, 
attempts to re-identify intended to test how effectively a dataset 
is 'de-identified' — or possibly to allow re-identification of indi- 
viduals for urgent health reasons (although covered entities de- 
identifying data already may include a re-identification code that 
they can use for this purpose). Such exemptions may be difficult 
to draft in legislation, and may need to be managed through the 
issuance of regulations or guidance by HHS. The contractual 
option would allow the parties to specify the narrow instances 
when re-identification would be permitted; however, to enable 
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consistent national policy on this issue, legislation or regulation 
could set more clear rules on when re-identification is permissible, 
and by whom. 

Any law prohibiting re-identification will need to include 
a clear definition of that term; it may also be possible to protect 
re-identification research through a carefully crafted definition 
that focuses on actually identifying individuals. Gellman 37 
defines re-identification as 'the linkage of deidentified personal 
information with an overt identifier which belongs or is assigned 
to a living or dead individual'. One workshop attendee suggested 
re-identification could be defined as an attempt to link data to 
categories of identifiers required to be excluded as part of a HIPAA 
limited dataset (see box 1 for an explanation of a limited dataset). 

Ensuring robustness of de-identification methodologies 

HHS created the methodologies for de-identification in regula- 
tion, so concerns about their sufficiency can be addressed 
without further legislation. Workshop participants largely 
agreed that strengthening these methodologies — coupled with 
accountability for re-identification — could considerably ease 
concerns about de-identification. 

Since its inception the safe harbor methodology has been 
criticized for, among other things, failing to account for data 
recipients' potential access to information that can be used to re- 
identify and their motivation to re-identify. But HHS has 
rejected calls to rely solely on the statistical methodology, noting 
that they intended to create an easy to follow, 'cookbook' 
approach to de-identification, which could be used by entities 
without access to a statistician: '[t]he Safe Harbor is intended to 
involve a minimum of burden and convey a maximum of 
certainty that the rules have been met...'. 13 Of note, when the 
safe harbor method was initially proposed, covered entities were 
required to have 'no reason to know' the recipient could 
potentially re-identify the data in order for it to qualify for safe 
harbor status. However, covered entities did not want to be 
legally liable for failing to estimate correctly what linking 
information a data recipient might be able to access. Conse- 
quently, in the final rule, HHS took the guesswork out of the 
standard and required only that covered entities not have 'actual 
knowledge' of re-identification possibilities. 13 

It is unlikely HHS would agree to eliminate the safe harbor; 
furthermore, its ease of use and policy imperatives to encourage 
data to be used in least identifiable form counsel against elimi- 
nation. The more palatable option is for HHS to review the safe 
harbor standard regularly such as biennially, to ensure it 
continues to provide a very low risk of re-identification. In 
addition, if the current safe harbor proves to be vulnerable in 
certain contexts, its use could be precluded in those contexts. 

The statistical methodology, which at least requires express 
consideration of other information that the recipient may be 
able to use to re-identify individuals, has also been criticized. Its 
viability depends on the quality of the statistical analysis, and 
there are currently no independent, objective mechanisms for 
vetting statistical analyses. Some have also argued that the 'very 
small risk' standard is too vague. 13 

In the final privacy rule, HHS recognized that entities 
choosing the statistical method of de-identification might need 
guidance to help them confidently achieve the 'very small risk' 
standard. HHS expressly listed two sources on statistical 
disclosure of confidential information published by the US 
Office of Management and Budget. 13 HHS acknowledged that 
for guidance on statistical techniques to be valuable, HHS would 
need to update it to keep up with technology and the avail- 
ability of public information from other sources. 13 HHS 



committed to providing said such updated guidance in the 
future, but as of May 2012, it has not done so. 

A number of workshop attendees supported further explora- 
tion of the following policy options, aimed at strengthening 
both de-identification methodologies: 

► HHS could create a process for objectively vetting or setting 
standards for the statistical methodology, to provide some 
assurance to the public that the methodologies meet the very 
low risk standard. Most were supportive of having the 
techniques and practices used in the statistical methodology 
vetted by the federal government, using statistical experts at 
the National Institute of Standards and Technology, the 
National Center for Health Statistics, and/or the Census 
Bureau, to establish trust and a level playing field. However, 
private sector entities that operate with full transparency, 
objectivity, and accountability might aptly fill the vetting role. 

► The current safe harbor standard should be reviewed on 
a regular basis. Such review could also be done by the 
'objective vetters' suggested above to bolster the statistical 
methodology. In addition, safe harbor status — and its legal 
certainty as a de-identification methodology — could be 
extended to those statistical methodologies that satisfy the 
objective vetting process described above. This has the 
potential of adding reliable recipes to the de-identification 
'cookbook', increasing their use and potentially reducing the 
cost of statistical de-identification. Although the current safe 
harbor applies to all recipients for all use scenarios, it is 
possible that safe harbor status should be granted only in 
circumstances in which a methodology has been demon- 
strated to achieve the very low risk standard. Any new 
methodology blessed with safe harbor status would also need 
to be reviewed on a consistent basis. 

► HHS could also explore certifying or accrediting entities that 
regularly de-identify data or evaluate re-identification risk; 
those whose methodologies pass the objective vetting criteria 
established above would then be deemed as certified or 
accredited. Such 'centers of de-identification excellence' 
would need to be re-certified or accredited on a regular 
basis. Again, the 'objective vetters' deployed to review 
statistical and safe harbor de-identification methodologies 
could play a role in designing and potentially implementing 
or overseeing the implementation of a certification or 
accreditation system. Such certification/accreditation could 
begin as a voluntary initiative, on the theory that most 
health data mining companies would seek it to demonstrate 
trustworthiness to customers and the public; mandates could 
be imposed if voluntary initiatives fail or when circumstances 
require a higher level of trust. 

► HHS should consider whether strengthening the safe harbor 
standard is sufficient to protect information in public use 
datasets. This could be particularly important if effective re- 
identification prohibitions for these data are not achievable. If 
re-identification risk depends on the motivation of the data 
recipient, and potential access to other information that can 
facilitate re-identification, it is more difficult to consider those 
risks with a dataset open to the public. 

Workshop attendees provided feedback on a few other ideas. 
For example, should there be retention limits on de-identified 
datasets or a requirement to refresh them after a period of time 
to help ensure they continue to raise a very low risk of re- 
identification? Furthermore, some thought HHS should explore 
creating mathematical standards establishing what constitutes 
'very small risk' of re-identification for different datasets and 
different purposes. However, others expressed concern that such 
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a standard would be impossible to calculate reliably, given that 
re-identification risk is contextual. Both of these ideas, as well as 
the recommendations above, require further exploration before 
being implemented as policy. 

Reasonable security safeguards 

Workshop attendees generally agreed with the idea that 
reasonable information security safeguards should protect all 
health information, and such safeguards should be commensu- 
rate with the privacy risks posed by the data. Consequently in 
the case of de-identified data, the degree of security required need 
not be as robust as that needed to protect identifiable data or 
data at greater risk of re-identification. For example, the HIPAA 
security rule requires protections for all electronic protected 
(identifiable) health information. 38 

If security safeguards should be commensurate with the risk 
posed by the data, data holders probably need some flexibility to 
determine appropriate security measures to adopt. At a minimum, 
holders of de-identified data should be held accountable for 
adopting security measures that protect against prohibited 
re-identification or ensure that commitments can be honored 
with respect to re-identification or limiting the particular uses of 
de-identified data. For example, if de-identified data are released by 
a covered entity for research purposes only, the recipient should 
adopt appropriate physical and technical safeguards to ensure 
access is limited to those conducting the research. Implementing 
security safeguards for public use datasets will be a particular 
challenge. 

Transparency to the public 

Distinct from concerns about the potential risks of de-identified 
data to confidentiality are concerns about actual uses of de- 
identified data. The Vermont statute in Sorrell restricted the use 
of de-identified data (see box 2) for pharmaceutical marketing 
purposes; similar statutes had been enacted in Maine and New 
Hampshire. 27 Concerns have also been raised about de-identified 
data informing business decisions in ways that could have a 
negative impact on patients. 39 FICO recently launched a 'medi- 
cation adherence score' tool that purports to use de-identified 
data to predict whether patients will adhere to medication 
regimens. 40 Officials from FICO claimed the information could 
not be used by insurers for risk adjustment purposes 41 ; never- 
theless, the news alarmed a number of consumer advocates. 42 43 
Many uses of de-identified data currently occur with little 
public knowledge, and this lack of transparency contributes to 
public concerns about health data de-identification. 7 35 Greater 
transparency about the de-identification standard, as well as on 
uses (and users) of de-identified data, could help build trust in 
uses of de-identified data; it could also help uncover concerns 
about uses of de-identified data that may be addressable 
through more direct policy. However, workshop attendees 
noted that effective outreach to the public on this issue will be 
a challenge; individuals often do not have good knowledge even 
of the uses and disclosures of identifiable health data. 50 HHS 
could consider funding pilot projects to increase transparency 
of de-identified data or tying federal funding or favorable 
regulatory treatment — such as safe harbor status for de- 
identification methodologies or Center of De-Identification 
Excellence status — to greater public transparency about 
de-identification and uses of de-identified data. 

CONCLUSION 

Increasing concerns about re-identification risks could 
erode trust in the HIPAA de-identification standard. But de- 



Box 2 Should individuals have the right to consent to uses 
of de-identified data? 



Even in circumstances in which the information raises a very low 
risk of re-identification, some individuals have heightened sensi- 
tivity regarding uses of health information about them. 44 A 2010 
survey by the California Healthcare Foundation found that 'a 
majority of adults express discomfort (42 per cent) or uncertainty 
(25 per cent) with their health information being shared with 
other organizations — even if... [their] name, address, [date of 
birth and social security number] were not included'. 45 Some 
have suggested that individuals should have the right to consent 
to — or at least be able to opt out of — having their data included 
in de-identified datasets. 35 39 

The surveys are not consistent on this issue. A survey released 
by the Markle Foundation in 201 1 found that at least 68% of the 
public, and 75% of doctors expressed willingness to 'allow 
composite information to be used to detect outbreaks, bio-terror 
attacks, and fraud, and to conduct research and quality and 
service improvement programs' 46 Markle noted that this result 
was consistent with a similar survey it conducted in 2006. 
Many workshop attendees expressed concern that allowing 
individuals to consent (either opt in or out) to being included in 
de-identified datasets treats de-identified data as though they 
raise the same risk as identifiable data, providing a disincentive to 
de-identify. In addition, consent in practice too often provides 
weak privacy protection, suggesting that relying on it as 
a mechanism to give individuals a voice in how de-identified data 
are used would probably not be very effective 47 48 Even more 
importantly, before implementing any policies requiring consent 
for uses of de-identified data, policy makers should consider the 
literature exploring whether consent requirements introduce 
selection biases into de-identified datasets, potentially distorting 
the accuracy of results, increasing the costs, and lengthening the 
time for conducting scientific research and healthcare quality 
assessments with de-identified data 49 
Promoting greater transparency of uses of de-identified data may 
be a more promising way to build public trust in de-identified data 
uses. 

identification, if done correctly, provides an important tool for 
privacy protection while preserving data utility for uses critical 
to advancing a more effective and efficient healthcare system. 
Policies to reduce the risk of re-identification, encourage use of 
strong de-identification methods and practices, and enhance 
public awareness of uses of de-identified data could help ease 
concerns and build public trust in a more robust health data 
ecosystem. 
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